Classification of Categorical Time Series Using the Spectral Envelope and Optimal Scalings

This article introduces a novel approach to the classification of categorical time series under the supervised learning paradigm. To construct meaningful features for categorical time series classification, we consider two relevant quantities: the spectral envelope and its corresponding set of optimal scalings. These quantities characterize oscillatory patterns in a categorical time series as the largest possible power at each frequency, or spectral envelope, obtained by assigning numerical values, or scalings, to categories that optimally emphasize oscillations at each frequency. Our procedure combines these two quantities to produce an interpretable and parsimonious feature-based classifier that can be used to accurately determine group membership for categorical time series. Classification consistency of the proposed method is investigated, and simulation studies are used to demonstrate accuracy in classifying categorical time series with various underlying group structures. Finally, we use the proposed method to explore key differences in oscillatory patterns of sleep stage time series for patients with different sleep disorders and accurately classify patients accordingly.


Introduction
Categorical time series are frequently observed in a variety of fields, including sleep medicine, genetic engineering, rehabilitation science, and sports analytics (Stoffer et al., 2000). In many applications, multiple realizations of categorical time series from different underlying groups are collected in order to construct a classifier that can accurately identify group membership.
As a motivating example, we consider a sleep study in which participants with different types of sleep disorders are monitored during a night of sleep via polysomnography in order to understand important clinical and behavioral differences among these sleep disorders.
During sleep, the body cycles through different sleep stages: movement/wakefulness, rapid eye movement (REM) sleep, and non-rapid eye movement (NREM) sleep, which is further divided into light sleep (S1, S2) and deep sleep (S3, S4). Our analysis focuses on two particular sleep disorders, nocturnal frontal lobe epilepsy (NFLE) and REM behavior disorder (RBD), for which differential diagnosis is especially challenging due to a significant overlap in their associated clinical and behavioral characteristics (Tinuper and Bisulli, 2017). For example, NFLE and RBD patients both exhibit complex, bizarre motor behavior and vocalizations during sleep. However, we posit that differences in sleep cycling behavior may still exist due to fundamental differences in the sleep disruption mechanisms of NFLE and RBD. The goal of our analysis is to investigate potential differences in sleep cycling behavior for NFLE and RBD patients and use this information to accurately classify patients accordingly. This data-driven classification can potentially improve accuracy in differential diagnoses of NFLE and RBD in patients presenting clinical and behavioral characteristics common to both conditions. Figure 1 displays examples of study participants' full-night sleep stage series from two different groups.
In the statistical literature, classification methods for multiple real-valued time series have been well studied; see Shumway and Stoffer (2016) for a review. However, classification of categorical time series has not received much attention. The majority of statistical methods for categorical time series analysis have been developed for analyzing a single categorical time series. Some examples include the Markov chain model of Billingsley (1961), the link function approach of Fahrmeir and Kaufmann (1987), the likelihood-based method of Fokianos and Kedem (1998), and the spectral envelope approach for analyzing a single time series introduced in Stoffer et al. (1993). A comprehensive discussion of this research direction can be found in Fokianos and Kedem (2003). More recently, Krafty et al. (2012) introduced the spectral envelope surface for quantifying the association between the oscillatory patterns of a collection of categorical time series and continuous covariates. However, the spectral envelope surface is not immediately useful for classification. To the best of our knowledge, this article presents the first statistical approach for supervised classification of multiple categorical time series.
In the computer science literature, however, many methods have been developed to classify string-valued time series, which can also be used for classification of categorical time series. These include the minimum edit distance classifier with sequence alignment (Navarro, 2001; Jurafsky and Martin, 2009), Markov chain-based classifiers (Deshpande and Karypis, 2002), the Haar wavelet classifier (Aggarwal, 2002), and the state-of-the-art sequence learner that uses a gradient-bounded coordinate-descent algorithm for efficiently selecting discriminative subsequences and then uses logistic regression for classification (Ifrim and Wiuf, 2011). These methods are black-box in nature and offer little help in understanding key differences among groups. On the other hand, the proposed method addresses the classification problem using the spectral envelope and optimal scalings, which provide low-dimensional, interpretable summary measures of oscillatory patterns and traversals through categories.
These patterns are often associated with scientific mechanisms that distinguish different groups, and classifying on them can also produce lower classification error than state-of-the-art computer science methods such as sequence learner.
Many classifiers for real-valued time series rely on feature extraction, a process in which low-dimensional summary quantities are constructed that capture essential features of the underlying groups. These quantities are then used to develop feature-based distance measures, such as the Kullback-Leibler distance and squared quadratic distance, which can be used to measure differences between groups and time series of unknown group membership.
Training data can then be used to estimate group-level quantities and construct a classifier that minimizes the distance between time series and their predicted group (Huang et al., 2004; Shumway and Stoffer, 2016). This type of approach cannot be easily extended to the classification of categorical time series due to the difficulty in obtaining low-dimensional features. To this end, we propose using the spectral envelope and its corresponding set of optimal scalings (Stoffer et al., 1993) as low-dimensional, interpretable features for differentiating groups of categorical time series. Use of these features is motivated by noticing that most categorical time series can be represented in terms of their prominent oscillatory patterns, characterized by the spectral envelope, and by the set of mappings from categories to numeric values that accentuate specific oscillatory patterns, characterized by the optimal scalings.
For example, Figures 2(a
The proposed method is briefly described as follows. For each time series to be classified, we represent it as a vector-valued time series through the use of indicator variables. The smoothed spectral density matrix of this vector-valued time series is then obtained, and the spectral envelope and optimal scalings at each frequency are computed from the estimated spectral matrix. Then, the spectral envelope and optimal scalings for each group are estimated from training data. The proposed feature, which is used to estimate the distance from each group, is obtained by adaptively summing the differences in the spectral envelope and optimal scalings. Finally, time series with unknown group membership are assigned to the groups with the most similar features (i.e., minimum distance). Under the proposed framework, we show that the misclassification probability is bounded as long as the spectral density matrix estimator is consistent. The procedure is demonstrated to perform well in simulation studies and a real data analysis.
The remainder of the paper is organized as follows. Section 2 provides definitions of the spectral envelope and optimal scalings and corresponding estimators. Section 3 introduces the proposed classification procedure and its theoretical properties. Section 4 provides detailed simulation studies, which explore the empirical properties of the proposed method and compare it with the state-of-the-art sequence learner classifier. Section 5 details the application of the proposed classifier to the analysis of sleep stage time series to better understand and accurately classify sleep disorders. Section 6 provides a closing discussion and possible extensions of this work.
2 The Spectral Envelope and Optimal Scalings
We assume that $X_t$ is stationary, such that $\{X_1, X_2, \ldots, X_t\} \overset{d}{=} \{X_{1+h}, X_{2+h}, \ldots, X_{t+h}\}$ for $h \geq 0$, and that $\inf_{\ell = 1, 2, \ldots, m} P(X_t = c_\ell) > 0$, so that there are no absorbing states. To obtain a quantifiable measure of oscillatory patterns for categorical time series, a typical approach is to consider a real-valued time series, $X_t(\beta)$, obtained by assigning numerical values, or scalings, to categories such that
$$X_t(\beta) = \beta_\ell \quad \text{when } X_t = c_\ell, \qquad \beta = (\beta_1, \ldots, \beta_m)' \in \mathbb{R}^m.$$
We assume that $X_t(\beta)$ has a continuous and bounded spectral density
$$f_x(\omega; \beta) = \sum_{\tau = -\infty}^{\infty} \mathrm{Cov}[X_t(\beta), X_{t+\tau}(\beta)] \exp(-2\pi i \omega \tau).$$
Let $V_x(\beta)$ be the variance of the scaled time series $X_t(\beta)$. The spectral envelope is then defined as the maximal normalized spectral density, $f_x(\omega; \beta)/V_x(\beta)$, among all possible scalings not proportional to $1_m$ at frequency $\omega$, where $1_m$ is the $m$-dimensional vector of ones.
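Scalings that assign the same value to each category are excluded since $V_x(\beta)$ is zero and the normalized power spectrum is not well defined. Formally, we define the spectral envelope and set of optimal scalings for frequency $\omega$ as
$$\lambda(\omega) = \max_{\beta \in \mathbb{R}^m \setminus \{1\}} \frac{f_x(\omega; \beta)}{V_x(\beta)}, \qquad B(\omega) = \arg\max_{\beta \in \mathbb{R}^m \setminus \{1\}} \frac{f_x(\omega; \beta)}{V_x(\beta)},$$
respectively, where $\{1\}$ is the subspace of $\mathbb{R}^m$ that is proportional to $1_m$. The spectral envelope, $\lambda(\omega)$, is the largest proportion of the variance that can be obtained at frequency $\omega$.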
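To make the role of the scalings concrete, the following minimal sketch maps a toy categorical series to two real-valued series and compares their normalized raw periodograms at the cycle frequency. The category labels, the two scalings, and the helper names `scaled_series` and `normalized_periodogram` are illustrative assumptions rather than part of the proposed method, and the raw periodogram is used here only as a simple stand-in for the spectral density.

```python
import numpy as np

def scaled_series(x, beta):
    """Replace each category in x by its numerical scaling beta[category]."""
    return np.array([beta[c] for c in x], dtype=float)

def normalized_periodogram(z):
    """Raw periodogram of z divided by its variance, at frequencies s/T."""
    T = len(z)
    zc = z - z.mean()
    K = (T - 1) // 2
    power = np.abs(np.fft.fft(zc)) ** 2 / T
    freqs = np.arange(1, K + 1) / T
    return freqs, power[1:K + 1] / zc.var()

# Toy series cycling a -> b -> c -> d with period 4 (frequency 0.25).
x = list("abcd") * 10
beta_1 = {"a": 1.0, "b": 0.0, "c": -1.0, "d": 0.0}   # hypothetical scaling
beta_2 = {"a": 1.0, "b": 0.0, "c": 0.0, "d": 0.0}    # hypothetical scaling

freqs, p1 = normalized_periodogram(scaled_series(x, beta_1))
_, p2 = normalized_periodogram(scaled_series(x, beta_2))
i = int(np.argmin(np.abs(freqs - 0.25)))
print(freqs[i], p1[i], p2[i])
```

For this toy series the first scaling places more normalized power at frequency 0.25 than the second, which is the sense in which some scalings emphasize a given oscillation better than others.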

Computation Through Reparameterization
A common approach to the analysis of any type of categorical data is to represent it in terms of random vectors of indicator variables. Similar to the formulations used in Stoffer et al. (1993) and Krafty et al. (2012), we define the $(m-1)$-dimensional stationary time series $Y_t$, which has a one in the $\ell$th element if $X_t = c_\ell$ for $\ell = 1, \ldots, m-1$ and zero elsewhere.
This representation is equivalent to setting the category $c_m$ as the reference category and restricting the set of optimal scalings to a lower-dimensional space. The assumption that $f_x(\omega; \beta)$ is continuous is necessary and sufficient for ensuring that $Y_t$ has a continuous spectral density, which is defined as
$$f_y(\omega) = \sum_{\tau = -\infty}^{\infty} \mathrm{Cov}(Y_t, Y_{t+\tau}) \exp(-2\pi i \omega \tau).$$
The spectral density $f_y(\omega)$ is a positive definite Hermitian $(m-1) \times (m-1)$ matrix. We assume $f_y(\omega)$ and the variance of $Y_t$, $V_y = \mathrm{Var}(Y_t)$, are non-singular for all $\omega \in \mathbb{R}$ (Brillinger, 2002). Formally, we define the spectral envelope and the corresponding set of optimal scalings used in our proposed classification algorithm as follows.
Several aspects of the definition should be noted. First, since the spectral density matrix is complex-valued and Hermitian with a skew-symmetric imaginary component, for every $a \in \mathbb{R}^{m-1}$ we have $a' f_y(\omega) a = a' f^{re}_y(\omega) a$, where $f^{re}_y(\omega)$ is the real part of $f_y(\omega)$. Thus, the spectral envelope is equivalent to the largest eigenvalue of $h^{re}(\omega) = V_y^{-1/2} f^{re}_y(\omega) V_y^{-1/2}$. Second, a connection between the optimal scalings derived from this formulation and those defined in Section 2.1 can be established (Krafty et al., 2012).
When the multiplicity of $\lambda(\omega)$ as an eigenvalue of $h^{re}(\omega)$ is one, there exists a unique $\gamma(\omega)$ such that $V_y^{1/2}\gamma(\omega)$ is an eigenvector of $h^{re}(\omega)$ associated with $\lambda(\omega)$, where $\gamma(\omega)' V_y \gamma(\omega) = 1$ and the first nonzero entry of $V_y^{1/2}\gamma(\omega)$ is positive. Third, if there is a significant frequency component near $\omega$, then $\lambda(\omega)$ will be large, and the values of $\gamma(\omega)$ depend on the particular cyclical traversal of the series through categories that produces the value of $\lambda(\omega)$ at frequency $\omega$.
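The indicator representation described in this section can be sketched as follows. The function name `indicator_series`, the category labels, and the toy series are illustrative assumptions; the last listed category plays the role of the reference category $c_m$.

```python
import numpy as np

def indicator_series(x, categories):
    """Map a categorical series to (m-1)-dimensional indicator vectors.

    The last entry of `categories` is treated as the reference category,
    so it is coded as a row of zeros.
    """
    index = {c: i for i, c in enumerate(categories)}
    m = len(categories)
    y = np.zeros((len(x), m - 1))
    for t, value in enumerate(x):
        ell = index[value]
        if ell < m - 1:              # the reference category stays all-zero
            y[t, ell] = 1.0
    return y

stages = ["REM", "S1", "S2", "Wake"]            # hypothetical category labels
x = ["S1", "S2", "S2", "REM", "Wake", "S1"]     # toy series, not real data
Y = indicator_series(x, stages)
print(Y)
```

Each row of `Y` has at most a single one, and rows for the reference category are identically zero.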

Estimation
Consider a realization of a categorical time series, $X_t$, $t = 1, \ldots, T$, and its corresponding multivariate process realization $Y_t$, $t = 1, \ldots, T$, defined in Section 2.2. Let $\hat{f}_y(\omega)$ represent the estimate of the spectral matrix $f_y(\omega)$. To allow for asymptotic development, we assume $Y_t$ is strictly stationary and that all cumulant spectra, of all orders, exist (Brillinger, 2002, Assumption 2.6.1). There is an extensive literature on estimation of the power spectral matrix. We use periodograms, or sample analogues of the spectrum,
$$I(\omega_s) = T^{-1} \left[\sum_{t=1}^{T} Y_t \exp(-2\pi i \omega_s t)\right] \left[\sum_{t=1}^{T} Y_t \exp(-2\pi i \omega_s t)\right]^{*}.$$
It is well known that the periodogram is an asymptotically unbiased but inconsistent estimator of the true spectral matrix. A common way to obtain a consistent estimator of the spectral matrix is to smooth periodogram ordinates over frequencies using kernels (Brillinger, 2002). In this paper, we consider the smoothed periodogram estimator
$$\hat{f}_y(\omega_s) = \sum_{j = -B_T}^{B_T} W_{B_T, j}\, I(\omega_{s+j}),$$
where $\omega_s = s/T$ for $s = 1, \ldots, K = \lfloor (T-1)/2 \rfloor$ are the Fourier frequencies, $2B_T + 1$ is the smoothing span, and $W_{B_T, j}$ are nonnegative weights that satisfy the following conditions: Generally, the weights are chosen such that (Brillinger, 2002).
Given the sample spectral matrix $\hat{f}_y(\omega)$ and sample variance $\hat{V}_y$, the estimate of the spectral envelope, $\hat{\lambda}(\omega)$, is the largest eigenvalue of $\hat{h}^{re}(\omega) = \hat{V}_y^{-1/2} \hat{f}^{re}_y(\omega) \hat{V}_y^{-1/2}$, and the optimal scaling, $\hat{\gamma}(\omega)$, is the eigenvector of $\hat{h}^{re}(\omega)$ associated with $\hat{\lambda}(\omega)$. It should be noted that other approaches for nonparametric estimation of the spectral matrix, such as those in Dai and Guo (2004), Rosen and Stoffer (2007), and Krafty and Collinge (2013), can also be used. We use the kernel smoothing approach for computational efficiency and ease of theoretical exposition.
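The estimation steps of this section can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the weights are taken to be equal over the smoothing span, the smoothing window wraps around the frequency axis, the series is mean-centered before the transform, and the scaling is reported as $\hat{V}_y^{-1/2}u$ for the leading eigenvector $u$ of $\hat{h}^{re}(\omega)$ so that it satisfies the normalization $\gamma' V_y \gamma = 1$ from Section 2.2. The function name `spectral_envelope` and the simulated toy series are hypothetical.

```python
import numpy as np

def spectral_envelope(y, span=2):
    """Estimate the spectral envelope and optimal scalings of indicator vectors y (T x p).

    Returns the Fourier frequencies s/T, the envelope (length K), and scalings (p x K).
    """
    T, p = y.shape
    K = (T - 1) // 2
    yc = y - y.mean(axis=0)
    V = yc.T @ yc / T                                   # sample variance of Y_t
    d = np.fft.fft(yc, axis=0) / np.sqrt(T)             # DFT at frequencies s/T
    I_re = np.real(d[:, :, None] * np.conj(d[:, None, :]))   # real part of periodogram
    # Equal-weight smoothing over 2*span+1 neighbouring Fourier frequencies (assumption).
    f_re = sum(np.roll(I_re, j, axis=0) for j in range(-span, span + 1)) / (2 * span + 1)
    w, U = np.linalg.eigh(V)
    V_inv_half = U @ np.diag(w ** -0.5) @ U.T           # V^{-1/2}
    lam = np.empty(K)
    gam = np.empty((p, K))
    for s in range(1, K + 1):
        h = V_inv_half @ f_re[s] @ V_inv_half
        vals, vecs = np.linalg.eigh((h + h.T) / 2)      # symmetrize against rounding
        u = vecs[:, -1]                                 # eigenvector of the largest eigenvalue
        first = u[np.flatnonzero(np.abs(u) > 1e-12)[0]]
        u = u if first > 0 else -u                      # first nonzero entry positive
        lam[s - 1] = vals[-1]
        gam[:, s - 1] = V_inv_half @ u                  # satisfies gamma' V gamma = 1
    return np.arange(1, K + 1) / T, lam, gam

# Toy example (simulated, not the sleep data): a noisy 4-category cycle of period 20.
rng = np.random.default_rng(0)
T = 400
x = (np.arange(T) // 5) % 4
flip = rng.random(T) < 0.1
x[flip] = rng.integers(0, 4, size=flip.sum())
y = np.eye(4)[x][:, :3]                                 # category 3 is the reference
freqs, lam, gam = spectral_envelope(y)
print(freqs[np.argmax(lam)])
```

In this toy example the estimated envelope should peak near frequency 0.05, the frequency of the underlying cycle. Other weight sequences or spectral estimators could be substituted for the equal-weight smoother without changing the eigenvalue step.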

The Classification Methods
Consider a population of categorical time series composed of $J \geq 2$ groups, $\Pi_1, \ldots, \Pi_J$.
Denote the $j$th group-level spectral envelope and $(m-1)$-variate scaling as $\Lambda^{(j)}(\omega)$ and $\Gamma^{(j)}(\omega)$ for $j = 1, \ldots, J$, respectively. Suppose we observe $N = \sum_{j=1}^{J} N_j$ independent training time series of length $T$ and $R$ independent testing time series of length $T$, $X_r = \{X_{r1}, \ldots, X_{rT}\}$, $r = 1, \ldots, R$, with unknown group membership. In this section, we introduce an adaptive algorithm for consistent classification.

Classification via the Spectral Envelope
As shown in Figures 2 and 3, groups of categorical time series may exhibit distinct oscillatory patterns. In this case, the spectral envelope, which characterizes dominant oscillatory patterns, can be used as a signature for each group and an important feature for categorical time series classification. We outline a classification procedure based on the spectral envelope below.
3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar spectral envelope such that $\hat{g}_r = \arg\min_j D^{(r)}_{j,ENV}$, $j = 1, \ldots, J$.
Classification consistency can be established under the following assumptions.To aid the presentation, we consider the case of J = 2 groups, Π 1 and Π 2 , while similar results can be derived for J > 2.
Assumption 1 Each element of the (m − 1) × (m − 1) spectral density matrix f y (ω) has bounded and continuous first derivatives.
Under Assumption 1, asymptotic consistency of the estimates $\hat{\lambda}(\omega)$ and $\hat{\gamma}(\omega)$ discussed in Section 2.3 can be established, and the largest eigenvalue of the spectral density matrix is continuous and bounded from above. Assumption 2 implies that the spectral envelopes of the two groups are well separated. The following theorem states the classification consistency of using the spectral envelope as a classifier.
Theorem 1 Under Assumptions 1-2, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows: where $D^{(r)}_{1,ENV}$ and $D^{(r)}_{2,ENV}$ are defined in (1).

Classification via Optimal Scalings
While the spectral envelope adequately characterizes dominant oscillatory patterns, it does not account for the traversals through categories responsible for such oscillatory patterns. Differences among groups may also be due to different traversals through categories that produce particular oscillatory patterns, which are characterized by optimal scalings for each frequency component. Similarly, we present a categorical time series classifier using optimal scalings below.
3. Classify time series $X_r$ to the group $\Pi_j$ with the most similar set of optimal scalings such that $\hat{g}_r = \arg\min_j D^{(r)}_{j,SCA}$, $j = 1, \ldots, J$.
In addition to Assumption 1, the following assumption is necessary to establish the classification consistency of the scaling classifier, which indicates that the optimal scalings are well separated.
Assumption 3 For fixed $m$ categories,
Theorem 2 states the consistency of classification based on the scalings.
Theorem 2 Under Assumptions 1 and 3, the probability of misclassifying $X_r$, a testing time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows: where $D^{(r)}_{1,SCA}$ and $D^{(r)}_{2,SCA}$ are defined in (2).

Proposed Adaptive Envelope and Scaling Classifier
The envelope classifier (Section 3.1) works well in situations where oscillatory patterns are different among groups, while the scaling classifier (Section 3.2) is effective when traversals through categories are distinct among groups. However, in practice, different groups are likely to exhibit different oscillatory patterns and traversals through categories to some extent.
Thus, it is desirable to construct an adaptive classifier that can automatically identify the extent to which groups are different with respect to their oscillatory patterns, traversals through categories, or both, and optimally classify time series accordingly. To this end, we propose a general-purpose, flexible classifier that adaptively weights differences in the spectral envelope and optimal scalings in order to determine the characteristics that best distinguish groups and provide accurate classification. Specifically, we consider the following distance of the $r$th testing time series to the $j$th group for $j = 1, \ldots, J$ and $r = 1, \ldots, R$. Since the spectral envelope $\hat{\lambda}^{(r)}$ is a $K$-dimensional vector and the scaling $\hat{\gamma}^{(r)}$ is an $(m-1) \times K$ matrix, we rescale these distances by their corresponding norms. The unknown tuning parameter $\kappa$ controls the relative importance of the spectral envelope and optimal scalings in classifying time series. Our proposed adaptive classification algorithm is presented in Algorithm 1.
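The idea of weighting the two discrepancies by $\kappa$ can be sketched as below. The exact form of the distance is an assumption made for illustration: squared Euclidean and Frobenius discrepancies, each rescaled by the squared norm of the test feature, combined as $\kappa$ times the envelope term plus $(1 - \kappa)$ times the scaling term. The function names and the toy group-level features are hypothetical.

```python
import numpy as np

def adaptive_distance(lam, gam, Lam_j, Gam_j, kappa):
    """Weighted sum of rescaled envelope and scaling discrepancies (assumed form)."""
    d_env = np.sum((lam - Lam_j) ** 2) / np.sum(lam ** 2)
    d_sca = np.sum((gam - Gam_j) ** 2) / np.sum(gam ** 2)
    return kappa * d_env + (1.0 - kappa) * d_sca

def classify(lam, gam, group_env, group_sca, kappa):
    """Return the group label whose features are closest to (lam, gam)."""
    return min(group_env, key=lambda j: adaptive_distance(
        lam, gam, group_env[j], group_sca[j], kappa))

# Hypothetical group-level features for K = 3 frequencies and m - 1 = 2 scalings.
group_env = {1: np.array([5.0, 2.0, 1.0]), 2: np.array([1.0, 2.0, 5.0])}
group_sca = {1: np.ones((2, 3)), 2: np.ones((2, 3))}    # identical scalings
lam_new = np.array([4.5, 2.2, 1.1])
gam_new = np.ones((2, 3))
print(classify(lam_new, gam_new, group_env, group_sca, kappa=0.7))
```

With $\kappa = 1$ this sketch reduces to an envelope-only rule and with $\kappa = 0$ to a scaling-only rule, which mirrors the role the tuning parameter plays in the text.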
Several remarks on the algorithm should be noted. First, the group-level spectral envelopes $\Lambda^{(j)}$ and optimal scalings $\Gamma^{(j)}$ are unknown in practice. We obtain $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by averaging the sample spectral envelopes and sample optimal scalings across training time series replicates within the $j$th group, respectively. In particular, we replace $\Lambda^{(j)}$ and $\Gamma^{(j)}$ by their sample estimates
$$\hat{\Lambda}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\lambda}^{(j,k)}, \qquad \hat{\Gamma}^{(j)} = \frac{1}{N_j} \sum_{k=1}^{N_j} \hat{\gamma}^{(j,k)},$$
for $j = 1, \ldots, J$, where $\hat{\lambda}^{(j,k)}$ and $\hat{\gamma}^{(j,k)}$ are the estimated spectral envelope and optimal scalings of the $k$th training time series in group $j$, respectively. Second, we select the tuning parameter $\kappa$ by using a grid search through leave-one-out (LOO) cross-validation.
Step 1: Use leave-one-out cross-validation to select the tuning parameter $\kappa$.
Step 2: for $r = 1, \ldots, R$ do
Convert the testing time series $X_r$ with $m$ categories into an $(m-1)$-dimensional time series $Y_r$ defined in Section 2.2 and compute the $(m-1) \times (m-1)$ matrix $\hat{h}(\omega)$ in Definition 1;
Compute the sample spectral envelope, $\hat{\lambda}^{(r)}(\omega_s)$, of the testing time series $X_r$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\lambda}^{(r)} = \{\hat{\lambda}^{(r)}(\omega_1), \ldots, \hat{\lambda}^{(r)}(\omega_K)\}$ as a $K$-dimensional vector;
Compute the $(m-1)$-dimensional sample optimal scalings, $\hat{\gamma}^{(r)}(\omega_s)$, of the testing time series $X_r$, where $\omega_s = s/T$ are the Fourier frequencies with $s = 1, \ldots, K$ and $K = \lfloor (T-1)/2 \rfloor$. Denote $\hat{\gamma}^{(r)} = \{\hat{\gamma}^{(r)}(\omega_1), \ldots, \hat{\gamma}^{(r)}(\omega_K)\}$ as an $(m-1) \times K$ matrix;
In particular, let $\kappa \in \{0, 0.1, 0.2, \ldots, 1\}$. The estimated $\kappa$ corresponds to the value that produces the highest leave-one-out classification rate via Algorithm 1. Although a finer grid could be used as well, in our experience, using $\kappa \in \{0, 0.1, 0.2, \ldots, 1\}$ performs well without sacrificing computational efficiency. Third, to obtain more parsimonious measures that can still discriminate among different groups, we may select a subset of elements in the spectral envelope and optimal scalings that are most different among groups. This strategy has been used in Fryzlewicz and Ombao (2009) for classifying nonstationary quantitative time series.
For example, we compute $\Delta(s)$, order its values decreasingly, and then choose the top proportion of the elements in $\Delta(s)$. A leave-one-out cross-validation approach that minimizes the classification error is then used to select an appropriate proportion.
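The grid search over $\kappa$ described in the remarks above can be sketched as follows. This is a rough illustration under assumptions: group-level features are within-group averages of the training features, the distance takes the same assumed rescaled form used for illustration elsewhere (it is not taken from the paper's displayed formula), and ties between grid values are broken in favor of the smaller $\kappa$. The function name `loo_select_kappa` and the synthetic features are hypothetical.

```python
import numpy as np

def loo_select_kappa(envelopes, scalings, labels, grid=None):
    """Pick kappa on a grid by leave-one-out classification rate.

    envelopes: N x K array; scalings: N x p x K array; labels: length-N group labels.
    """
    grid = np.linspace(0.0, 1.0, 11) if grid is None else np.asarray(grid)
    labels = np.asarray(labels)
    n = len(labels)
    best_kappa, best_rate = None, -1.0
    for kappa in grid:
        correct = 0
        for i in range(n):
            keep = np.arange(n) != i                    # hold out series i
            dist = {}
            for g in np.unique(labels[keep]):
                idx = keep & (labels == g)
                Lam = envelopes[idx].mean(axis=0)       # group-level envelope
                Gam = scalings[idx].mean(axis=0)        # group-level scalings
                d_env = np.sum((envelopes[i] - Lam) ** 2) / np.sum(envelopes[i] ** 2)
                d_sca = np.sum((scalings[i] - Gam) ** 2) / np.sum(scalings[i] ** 2)
                dist[g] = kappa * d_env + (1.0 - kappa) * d_sca
            correct += int(min(dist, key=dist.get) == labels[i])
        rate = correct / n
        if rate > best_rate:                            # ties keep the smaller kappa
            best_kappa, best_rate = float(kappa), rate
    return best_kappa, best_rate

# Synthetic features: groups differ in the envelope only; scalings are pure noise.
rng = np.random.default_rng(1)
labels = np.repeat([1, 2], 10)
envelopes = np.where(labels[:, None] == 1, 1.0, 2.0) + 0.05 * rng.standard_normal((20, 8))
scalings = rng.standard_normal((20, 3, 8))
kappa_hat, rate = loo_select_kappa(envelopes, scalings, labels)
print(kappa_hat, rate)
```

Because the synthetic groups differ only in their envelopes, the selected value should put enough weight on the envelope term to classify every held-out series correctly.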
Under Assumptions 1 and 4, classification consistency is established in Theorem 3.
Assumption 4 For fixed $m$ categories,
Theorem 3 Under Assumptions 1 and 4, the probability of misclassifying $X_r$, a time series from group $\Pi_1$, to group $\Pi_2$ can be bounded as follows: where D

Simulation Studies
We conduct simulation studies to evaluate the performance of the proposed classification procedure. Following Fokianos and Kedem (2003), categorical time series $X_t$ are generated from the multinomial logit model as follows and , where $Y_t$ is an $(m-1)$-dimensional time series which has a one in the $\ell$th element if $X_t = c_\ell$ for $\ell = 1, \ldots, m-1$ and zero elsewhere, $p_{t\ell}$ for $\ell = 1, \ldots, m$ are the probabilities of $X_t = c_\ell$ at time $t$ and satisfy $\sum_{\ell=1}^{m} p_{t\ell} = 1$, and $\alpha_\ell$ for $\ell = 1, \ldots, m$ are the regression parameters. The simulated model incorporates a lagged value of order one of $Y_t$ or $X_t$. We consider three different cases under the multinomial model. For the first two cases, we let the number of categories $m = 4$ and the number of groups $J = 2$. For Case 1, we consider the following regression parameters.
Figures 2(a) and 2(b) display realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 1, respectively. For Case 2, the regression parameters are set to be
Figures 2(c) and 2(d) present realizations of time series from groups $\Pi_1$ and $\Pi_2$ in Case 2, respectively. For Case 3, we consider $J = 3$ different groups with the following regression parameters
One hundred replications are generated for each of the 27 combinations of 3 cases, 3 numbers of time series per group in the training data ($N_j = 20, 50, 100$ for all $j$), and 3 time series lengths ($T = 100, 200, 500$). A test dataset of 50 time series per group is generated for each replication to evaluate the out-of-sample classification performance. Four different methods are implemented: the proposed classifier, which utilizes both the spectral envelope and optimal scalings (EnvSca); the classifier using the spectral envelope only (ENV); the classifier using the optimal scalings only (SCA); and the sequence learner classifier (SEQ) of Ifrim and Wiuf (2011).
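A data-generating step of this kind could be sketched as follows. The parameterization is an assumption made for illustration only: each non-reference category has a linear predictor with an intercept and coefficients on the lagged indicator vector, the reference category has predictor zero, and probabilities are obtained through the softmax transform. The function name `simulate_logit_series` and the coefficient matrix `alpha` are hypothetical and are not the regression parameters used in the three cases.

```python
import numpy as np

def simulate_logit_series(alpha, T, rng):
    """Simulate a categorical series with lag-one multinomial logit dynamics.

    alpha: (m-1) x m array; row l holds an intercept followed by the
    coefficients on the lagged indicator vector (assumed parameterization).
    """
    m = alpha.shape[0] + 1
    x = np.empty(T, dtype=int)
    x[0] = rng.integers(m)
    for t in range(1, T):
        y_prev = np.zeros(m - 1)
        if x[t - 1] < m - 1:                   # reference category is all zeros
            y_prev[x[t - 1]] = 1.0
        eta = alpha[:, 0] + alpha[:, 1:] @ y_prev
        expo = np.exp(np.append(eta, 0.0))     # reference category has predictor 0
        x[t] = rng.choice(m, p=expo / expo.sum())
    return x

# Hypothetical coefficients for m = 4 categories that favor staying in place.
alpha = np.array([[0.0, 2.0, -1.0, -1.0],
                  [0.0, -1.0, 2.0, -1.0],
                  [0.0, -1.0, -1.0, 2.0]])
x_sim = simulate_logit_series(alpha, T=200, rng=np.random.default_rng(42))
print(np.bincount(x_sim, minlength=4))
```

Changing the off-diagonal coefficients alters which transitions are likely, which is one way to produce groups that differ in their traversals through categories.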
Table 1 summarizes the means and standard deviations of the correct classification rates.
For Case 1, the proposed classifier and the envelope classifier perform similarly, and they both outperform sequence learner. The scaling classifier has classification rates around 50%, meaning that it is no better than a random guess. These results are unsurprising because $\Pi_1$ and $\Pi_2$ have different oscillatory patterns but similar traversals through categories, resulting in a poor classification rate if we use only the optimal scalings for classification. For Case 2, where the two groups are distinct mainly in the optimal scalings, the envelope classifier produces the lowest correct classification rate (around 50%) among all methods considered.
The proposed classifier and the scaling classifier perform similarly. They have slightly lower classification rates than sequence learner, which is designed to select and use all subsequences that are important in classifying responses and thus is well suited for the setting in Case 2. In Case 3, we consider three groups, and groups differ in cyclical patterns and scalings.
The proposed classifier has higher mean classification rates than the envelope and scaling classifiers. This is because groups are different in both oscillatory patterns and traversals through categories. The proposed classifier, by incorporating both the spectral envelope and optimal scalings, can produce better classification rates in this case. It should be noted that sequence learner is developed under the framework of logistic regression and cannot classify a population of time series with more than two groups in its current form. One could extend sequence learner to multinomial logistic regression, but extensive programming efforts are needed and no prior results are available. Thus, we do not have simulation results for sequence learner in Case 3.
In addition to classification, estimates of the tuning parameter $\kappa$ in the proposed algorithm allow for interpretable inference. For example, the averages of the estimated tuning parameters $\hat{\kappa}$ in our simulations for Cases 1, 2, and 3 are 1.00, 0.24, and 0.66, respectively. This suggests that $\kappa$ can help us identify whether groups are different in oscillatory patterns only, traversals through categories only, or a mixture of the two.
The data for this analysis were collected through a study of various sleep-related disorders (Terzano et al., 2001) and are publicly available via PhysioNet (Goldberger et al., 2000). All participants were monitored during a full night of sleep, and their sleep stages were annotated by experienced technicians every 20 seconds according to well-established sleep staging criteria (Rechtschaffen and Kales, 1968). We consider classifying sleep stage time series data collected from NFLE and RBD patients, for which differential diagnosis is particularly challenging (Tinuper and Bisulli, 2017). NFLE and RBD patients both experience significant sleep disruptions associated with complex, often bizarre motor behavior (e.g., violent movements of arms or legs, dystonic posturing) and vocalization (e.g., screaming, shouting, laughing), which is due to nocturnal seizures for NFLE patients (Tinuper and Bisulli, 2017) and due to dream-enacting behavior in REM sleep for RBD patients (Schenck et al., 1986). This makes differentiating RBD and NFLE patients particularly challenging.
An objective, data-driven classification procedure that can automatically distinguish patients and aid differential diagnosis is needed.
The current analysis considers 8 hours of sleep stage time series from N = 46 participants: 34 NFLE patients and 12 RBD patients. This results in categorical time series of length T = 1440 with m = 6 sleep stages (REM, S1, S2, S3, S4, and Wake/Movement). Examples are provided in Figure 1. In order to estimate the spectral envelope and optimal scalings, Wake/Movement is used as the reference category. Leave-one-out (LOO) cross-validation is then used to empirically evaluate the effectiveness of the classification rule. For these data, the overall correct classification rate is 82.61%, with 29 of the 34 NFLE patients correctly classified and 9 of the 12 RBD patients correctly classified. The tuning parameter estimated via LOO cross-validation is κ = 0.852. This indicates that differences in spectral envelopes are relatively more important for accurately classifying members of each group compared to differences in optimal scalings for these data.
In addition to providing a classification rule for categorical time series, the estimated group-level spectral envelopes and optimal scalings (see Figure 4) provide insights into key differences in oscillatory patterns between the groups. For both groups, power is concentrated at lower frequencies (≤ 0.05), representing cycles lasting longer than 6.7 minutes and accounting for 87.4% and 84.6% of total power for the NFLE and RBD groups, respectively. This is expected, as longer sleep cycles tend to dominate sleep, with typical NREM-REM sleep cycles lasting between 70 and 120 minutes (Institute of Medicine, 2006). Accordingly, our analysis focuses on differences between groups among low frequencies.
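The correspondence between frequencies and cycle lengths quoted in this section follows from the 20-second annotation interval, since frequencies are expressed in cycles per epoch. A small sketch of the conversion is given below; the helper name `period_minutes` is an illustrative assumption.

```python
def period_minutes(freq, epoch_seconds=20.0):
    """Cycle length in minutes for a frequency given in cycles per epoch."""
    return epoch_seconds / freq / 60.0

# Frequencies referred to in the sleep analysis (cycles per 20-second epoch).
for f in (0.05, 0.025, 0.02):
    print(f, round(period_minutes(f), 1))

# Eight hours of 20-second epochs gives the stated series length.
n_epochs = int(8 * 3600 / 20)
print(n_epochs)
```

For instance, a frequency of 0.05 corresponds to one cycle every 20 epochs, or about 6.7 minutes.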
First, the estimated spectral envelopes for the two groups (see Figure 4) are reasonably well-separated for frequencies below 0.02 (representing cycles longer than 16.7 minutes), with NFLE patients generally exhibiting more low frequency power than RBD patients.
This result is not completely unexpected, since RBD patients tend to wake up abruptly at the end of a dream-enacting episode and are alert (Foldvary-Schaefer and Alsheikhtaha,
Second, differences in optimal scalings (see Figure 4) are more subtle, with noticeable differences over some categories (e.g., S3, S4
It is important to note that the proposed classification rule automatically adapts to these particular features of the spectral envelopes and optimal scalings through the data-driven estimate of κ = 0.852 using LOO cross-validation, which assigns more weight to differences in spectral envelopes in distinguishing between the two groups. This is an important feature of the proposed classification procedure, as it allows the classification rule to adapt to differences between groups in the spectral envelope, optimal scalings, or both.

Discussion
This article presents a novel approach to classifying categorical time series. An adaptive algorithm that utilizes both the spectral envelope and its corresponding set of optimal scalings for classification of categorical time series is developed. Classification consistency is also established. We conclude this article by discussing some limitations and related future extensions. First, the proposed method assumes that the collection of time series is stationary.
Proofs of Lemmas 1 and 2 follow directly from Brillinger (2002, Theorems 9.4.1 and 9.4.3) and are thus omitted.
s ) 2 is of order T 2 . Combining these results, we have $I = O(B_T^2 T^{-2})$. Similarly, using Lemma 1 and Assumption 2, we have $II = O(B_T^2 T^{-2})$, which completes the proof.

Figure 1: Sleep stage time series from six sleep study participants: three NFLE patients (top row) and three RBD patients (bottom row).

Figure 2 :
Figure 2(c) spends approximately equal amounts of time in each category, while the time series in Figure 2(d) spends more time in categories 2 and 3. Moreover, Figure 3 displays the estimated spectral envelope for the two series in Figures 2(a) and 2(b) and the optimal scal-
The spectral envelope characterizes important oscillatory patterns in categorical time series. For illustration, Figures 3(a) and 3(b) display the estimated spectral envelopes for the time series displayed in Figures 2(a) and 2(b), respectively. It can be seen that the time series in Figure 2(a), which oscillates more slowly than the time series in Figure 2(b), has more power in the estimated spectral envelope at lower frequencies. The set of optimal scalings that maximize the normalized spectral density at frequency ω, B(ω), provides important information about the traversals through categories associated with prominent oscillatory patterns at frequency ω. For further illustration, Figures 3(c) and 3(d) display the estimated optimal scalings for the time series displayed in Figures 2(c) and 2(d), respectively. The optimal scalings in Figure 3(d) for categories 2 and 3 are similar at lower frequencies (ω < 0.2), but the optimal scalings in Figure 3(c) for categories 2 and 3 are different at lower frequencies. This is because the corresponding time series in Figure 2(d) visits categories 2 and 3 more frequently than the time series in Figure 2(c).

Figure 3: (a) and (b): The spectral envelopes of the time series shown in panels (a) and (b) of Figure 2; (c) and (d): The scalings of the time series presented in panels (c) and (d) of Figure 2.
During a full night of sleep, the body cycles through different sleep stages, including rapid eye movement (REM) sleep, in which dreaming typically occurs, and non-rapid eye movement (NREM) sleep, which consists of four stages representing light sleep (S1, S2) and deep sleep (S3, S4). These sleep stages are associated with specific physiological behaviors that are essential to the rejuvenating properties of sleep. Disruptions to typical cyclical behavior and changes in the amount of time spent in each sleep stage have been found to be associated with many sleep disorders (Zepelin et al., 2005; Institute of Medicine, 2006). Particular sleep disorders, such as nocturnal frontal lobe epilepsy (NFLE), are also difficult to accurately diagnose since clinical, behavioral, and electroencephalography (EEG) patterns for NFLE patients are often similar to those of patients with other sleep disorders, such as REM behavior disorder (RBD) (D'Cruz and Vaughn, 1997; Tinuper and Bisulli, 2017). Accordingly, there is a need for statistical procedures that can automatically identify cyclical patterns in sleep stage time series associated with specific sleep disorders and accurately classify patients with different sleep disorders.
2013), which can disrupt typical sleep cycles and reduce the prominence of low frequency oscillations.On the other hand, NFLE patients do not typically wake up immediately following a nocturnal seizure (Foldvary-Schaefer and Alsheikhtaha, 2013).The contrasting effects are also reflected in the data, in which RBD patients spend nearly twice as much time in the Wake/Movement stage during the night on average compared to NFLE patients (61.4 minutes vs. 32.1 minutes).

Figure 4: Left: Estimated spectral envelope for NFLE patients (solid red) and RBD patients (dashed blue) for low frequencies (below 0.05). Group-level estimated spectral envelopes are represented by the two thicker lines. Right: Estimated optimal scalings for NFLE patients (top) and RBD patients (bottom) for low frequencies (below 0.05).

Figure 5: Top: Sample time series from the NFLE and RBD groups. Bottom: Corresponding scaled time series based on the mean scaling for frequencies below 0.025 (i.e., cycles lasting more than 13 minutes). Colors corresponding to NREM (purple), REM (blue), and W/MT (yellow) sleep stages are also provided.

Table 1: Mean (standard deviation) of the percent of correctly classified time series across methods.
Figure 5 provides a sample series from each group along with the scaled time series obtained by averaging optimal scalings over frequencies below 0.025. Given the propensity for RBD patients to experience immediate sleep disruptions more so than NFLE patients, it is not surprising that RBD patients experience less deep sleep than NFLE patients.